Skip to content

Conversation

wpessers
Copy link

Which problem is this PR solving?

#2921 No SQS Context propagation in aws lambda.

Short description of the changes

Used pubsub propagation utils to automatically created processing spans for each sqs record within a lambda sqs event. These spans link to their producer span, as is described in the spec: https://opentelemetry.io/docs/specs/semconv/messaging/messaging-spans/#consumer-spans

@wpessers wpessers requested a review from a team as a code owner August 12, 2025 22:17
@github-actions github-actions bot requested a review from jj22ee August 12, 2025 22:17
@wpessers wpessers marked this pull request as draft August 12, 2025 22:17
@wpessers
Copy link
Author

wpessers commented Aug 12, 2025

Converted this back to draft PR. Some stuff I need to figure out first. Used the sqs instrumentation from the aws sdk instrumentation package as an example but there are a bunch of deprecated semconv attributes.

@wpessers wpessers force-pushed the feat/instrumentation-aws-lambda/sqs-context-propagation branch from e0722a0 to 69c3ef1 Compare August 13, 2025 21:49
Copy link

codecov bot commented Aug 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.85%. Comparing base (cc7eff4) to head (1dca665).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2981      +/-   ##
==========================================
+ Coverage   89.83%   89.85%   +0.01%     
==========================================
  Files         188      188              
  Lines        9288     9311      +23     
  Branches     1905     1912       +7     
==========================================
+ Hits         8344     8366      +22     
- Misses        944      945       +1     
Files with missing lines Coverage Δ
.../instrumentation-aws-lambda/src/instrumentation.ts 95.02% <100.00%> (+0.38%) ⬆️

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@wpessers wpessers marked this pull request as ready for review August 17, 2025 21:58
@wpessers
Copy link
Author

@jj22ee Any thoughts on this PR? There's some use of deprecated semconv attributes for the messaging spans. But these only have alternatives that are still incubating currently. So I believe we would prefer the deprecated ones to avoid possibly breaking on minor version updates for these semconv attributes? Also, there's currently still a bunch of separate commits not adhering to the conventional commits standard. Is it expected that I rebase and squash the commits myself? Or is this done eventually on merge?

@wpessers
Copy link
Author

Also, I tested this by deploying this version of the aws-lambda instrumentation and actually exporting the spans to grafana. Resulting in traces that look like this:
This is the trace of the lambda that is triggered by sqs, thus containing the regular function invocation span as parent. And then a bunch of process spans for each sqs record in this batch. These each have a span link back to the message producer span as you can see.
Screenshot 2025-08-16 223919

If we follow such a span link, we end up at the message producer span:
Screenshot 2025-08-16 223950

@gkaskonas
Copy link

Can this be merged, please?

@jj22ee
Copy link
Contributor

jj22ee commented Aug 28, 2025

Hey, planning to take a look this week.

@wpessers
Copy link
Author

wpessers commented Sep 1, 2025

Hi @jj22ee we've discussed this in the FaaS SIG again. The span linking visualization that I showed above for grafana is not something that is generally supported by other observability platforms. Therefore we want to change this PR a bit so that we can make it configurable to either use the current implementation with span linkgs or set the processing spans' parent context to the producer span instead of span linking. Users can then configure themselves how they want the context propagation to work.

Just so people using different observability platforms can also get the benefits of this feature, and the second approach will make this information visible regardless of which platform you export spans to.
Any thoughts?

@jj22ee
Copy link
Contributor

jj22ee commented Sep 3, 2025

Hi @wpessers

There's some use of deprecated semconv attributes for the messaging spans. But these only have alternatives that are still incubating currently. So I believe we would prefer the deprecated ones to avoid possibly breaking on minor version updates for these semconv attributes?

The workaround that is being done in the contrib repo is to just copy the incubating attributes into the package's semconv.ts file. For example, see this incubating attribute for aws-lambda-instrumentation: https://github.com/open-telemetry/opentelemetry-js-contrib/blob/main/packages/instrumentation-aws-lambda/src/semconv.ts

@jj22ee
Copy link
Contributor

jj22ee commented Sep 4, 2025

Regarding the original issue: #2921 (comment)

In aws-lambda instrumentation, we should identify whether an input event contains sqs records. If it does we should do something similar to what the aws-sdk instrumentation is already doing, to link processing spans back to the producing spans by extracting context from the message system attributes.

It was discussed to drop Processing Spans in OTel JS's AWS SDK instrumentation due to implementation limitations and updates to the spec

However, there is a separate spec specifically for handling SQS in AWS Lambda, which is different from the spec you linked in the description:

This one does state that:

The function invocation span MUST correspond to the SQS event, which is the batch of messages. For each message, an additional span SHOULD be created to correspond with the handling of the SQS message

So this PR does follow the Lambda+SQS Spec to create new process spans for each message.

@jj22ee
Copy link
Contributor

jj22ee commented Sep 4, 2025

I advise you to look at the Lambda+SQS spec because the spec details in there differs from this PR (e.g. required usage of AWS X-Ray Propagator and using message system attributes instead of message attributes to check for AWSTraceHeader to generate the context for span link).

Therefore we want to change this PR a bit so that we can make it configurable to either use the current implementation with span linkgs or set the processing spans' parent context to the producer span instead of span linking. Users can then configure themselves how they want the context propagation to work.

I'd recommend to just start with the initial support for process spans for each message w/ span links since that is part of spec. Do you know if the alternative configuration will also be added to the spec? Sounds like it might be complicated because it still make sense for processing spans' parent to be the Lambda Function Span.

@wpessers
Copy link
Author

wpessers commented Sep 4, 2025

It was discussed to drop Processing Spans in OTel JS's AWS SDK instrumentation due to implementation limitations and updates to the spec

Oh, I see. This was apparently ongoing right before I opened this PR, but I somehow missed it.

@wpessers
Copy link
Author

wpessers commented Sep 4, 2025

I advise you to look at the Lambda+SQS spec because the spec details in there differs from this PR (e.g. required usage of AWS X-Ray Propagator and using message system attributes instead of message attributes to check for AWSTraceHeader to generate the context for span link).

That makes sense, will read through it and make the necessary changes.

I'd recommend to just start with the initial support for process spans for each message w/ span links since that is part of spec. Do you know if the alternative configuration will also be added to the spec? Sounds like it might be complicated because it still make sense for processing spans' parent to be the Lambda Function Span.

Sounds good, I think we can indeed just start with the implementation adhering to the spec. I don't know if the alternative configuration will be added to the spec but it's probably worth discussing. Agree, it does indeed make sense for the parent to be the Lambda Function span. It's just a shame that this will not be represented properly in a lot of observability platforms. If we can (after the initial implementation) also add an experimental feature to allow end-users to opt-in to the alternative way of creating the processing spans with the producer span as their parent, then we can properly evaluate whether a spec change would be warranted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants